Large-scale music similarity search with spatial trees
Abstract
Many music information retrieval tasks require finding the nearest neighbors of a query item in a high-dimensional space. However, the cost of computing nearest neighbors grows linearly with the size of the database, making exact retrieval impractical for large databases. We investigate modern variants of the classical KD-tree algorithm, which efficiently index high-dimensional data by recursive spatial partitioning. Experiments on the Million Song Dataset demonstrate that content-based similarity search can be significantly accelerated by the use of spatial partitioning structures.
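To make the idea of recursive spatial partitioning concrete, here is a minimal sketch of a classical KD-tree with exact nearest-neighbor search. It is not one of the modern variants the abstract refers to, and it does not reproduce the paper's experiments; the function names, the `leaf_size` parameter, and the dictionary-based node layout are all assumptions made for illustration.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_kd_tree(points, depth=0, leaf_size=8):
    """Recursively split the points at the median of one coordinate per level.

    The split axis cycles through the dimensions (a simple, classical rule).
    """
    if len(points) <= leaf_size:
        return {"leaf": True, "points": points}
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "leaf": False,
        "axis": axis,
        "threshold": points[mid][axis],
        "left": build_kd_tree(points[:mid], depth + 1, leaf_size),
        "right": build_kd_tree(points[mid:], depth + 1, leaf_size),
    }


def nearest(tree, query, best=None):
    """Return (squared distance, point) for the nearest neighbor of `query`.

    The far side of a split is visited only if the splitting plane is closer
    than the best distance found so far, which is what lets the search skip
    parts of the database.
    """
    if tree["leaf"]:
        for p in tree["points"]:
            d = squared_distance(p, query)
            if best is None or d < best[0]:
                best = (d, p)
        return best
    diff = query[tree["axis"]] - tree["threshold"]
    if diff < 0:
        near, far = tree["left"], tree["right"]
    else:
        near, far = tree["right"], tree["left"]
    best = nearest(near, query, best)
    if best is None or diff * diff < best[0]:
        best = nearest(far, query, best)
    return best


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical stand-in for song feature vectors: 1000 random 5-D points.
    database = [tuple(random.random() for _ in range(5)) for _ in range(1000)]
    tree = build_kd_tree(database)
    query = tuple(random.random() for _ in range(5))
    print(nearest(tree, query))
```

The linear scan that the abstract describes corresponds to calling `squared_distance` on every database item. The tree returns the same nearest neighbor but can prune whole subtrees. In very high dimensions this pruning becomes less effective, which is one reason approximate variants are of interest.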
Related resources
Learning Binary Codes For Efficient Large-Scale Music Similarity Search
Content-based music similarity estimation provides a way to find songs in the unpopular “long tail” of commercial catalogs. However, state-of-the-art music similarity measures are too slow to apply to large databases, as they are based on finding nearest neighbors among very high-dimensional or non-vector song representations that are difficult to index. In this work, we adopt recent machine le...
A Tale of Two (Similar) Cities: Inferring City Similarity Through Geo-Spatial Query Log Analysis Submitted for Blind Review
Understanding the backgrounds and interests of the people who are consuming a piece of content, such as a news story, video, or music, is vital for the content producer as well as the advertisers who rely on the content to provide a channel on which to advertise. We extend traditional search-engine query log analysis, which has primarily concentrated on analyzing either single or small groups of qu...
A partition-based algorithm for clustering large-scale software systems
Clustering techniques are used to extract the structure of software for understanding, maintaining, and refactoring. In the literature, most of the proposed approaches for software clustering are divided into hierarchical algorithms and search-based techniques. In the former, clustering is a process of merging (splitting) similar (non-similar) clusters. These techniques suffered from the drawba...
Eyes4Ears - More than a Classical Music Retrieval System
Content-based similarity search for music retrieval has attracted a lot of attention in recent information retrieval research. Most music applications (e.g. several commercial web portals) offer music file search, which however is limited to keyword-based search on subjects like genre or artist. Other similarity search approaches are based on abstract metrics, which are defined on feature vectors repr...
Efficient multifeature index structures for music data retrieval
In this paper, we propose four index structures for music data retrieval. Based on suffix trees, we develop two index structures called Combined Suffix Tree and Independent Suffix Trees. These methods still show shortcomings for some search functions. Hence we develop another index, called Twin Suffix Trees, to overcome these problems. However, the Twin Suffix Trees lack scalability when the...